
Low-Fidelity Video Encoder Optimization for Temporal Action Localization

Neural Information Processing Systems

This is why the two-stage optimization pipeline as described above becomes the most common and feasible choice in practice for optimizing a TAL model.


Do Blind Spots Matter for Word-Referent Mapping? A Computational Study with Infant Egocentric Video

Shi, Zekai, Cai, Zhixi, Stefanov, Kalin

arXiv.org Artificial Intelligence

Typically, children start to learn their first words between 6 and 9 months, linking spoken utterances to their visual referents. Without prior knowledge, a word encountered for the first time can be interpreted in countless ways; it might refer to any of the objects in the environment, their components, or their attributes. In this work, using longitudinal, egocentric, and ecologically valid data from the experience of one child, we propose a self-supervised and biologically plausible strategy for learning strong visual representations. Our masked-autoencoder-based visual backbone incorporates knowledge about the blind spot in the human eye to define a novel masking strategy. This mask-and-reconstruct approach attempts to mimic the way the human brain fills in the gaps in the eye's field of view, a significant shift from standard random masking strategies, which are difficult to justify from a biological perspective. The pre-trained encoder is then used in a contrastive-learning-based video-text model capable of acquiring word-referent mappings. Extensive evaluation suggests that the proposed biologically plausible masking strategy is at least as effective as random masking for learning word-referent mappings from cross-situational and temporally extended episodes.
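The blind-spot masking idea can be sketched as a deterministic mask over a ViT-style patch grid: instead of hiding random patches, hide a contiguous region around a fixed off-center point, as the retinal blind spot does. The grid size, blind-spot center, and radius below are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

def blind_spot_mask(grid_h, grid_w, center, radius):
    """Boolean mask over a patch grid: True = patch hidden from the encoder.

    Patches within `radius` (in patch units) of `center` are masked,
    mimicking a retinal blind spot rather than uniform random masking.
    """
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    cy, cx = center
    dist = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
    return dist <= radius

# Example: 14x14 patch grid (a 224x224 image with 16x16 patches),
# blind spot centred slightly off-axis, as in the human eye.
mask = blind_spot_mask(14, 14, center=(7, 10), radius=2.5)
print(mask.sum(), "of", mask.size, "patches masked")
```

The masked patches would then be dropped before the encoder and reconstructed by the decoder, exactly as in a standard masked autoencoder; only the choice of which patches to hide changes.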



Video CLIP Model for Multi-View Echocardiography Interpretation

Takizawa, Ryo, Kodera, Satoshi, Kabayama, Tempei, Matsuoka, Ryo, Ando, Yuta, Nakamura, Yuto, Settai, Haruki, Takeda, Norihiko

arXiv.org Artificial Intelligence

Echocardiography records ultrasound videos of the heart, enabling clinicians to assess cardiac function. Recent advances in large-scale vision-language models (VLMs) have spurred interest in automating echocardiographic interpretation. However, most existing medical VLMs rely on single-frame (image) inputs, which can reduce diagnostic accuracy for conditions identifiable only through cardiac motion. In addition, echocardiographic videos are captured from multiple views, each varying in suitability for detecting specific conditions. Leveraging multiple views may therefore improve diagnostic performance. We developed a video-language model that processes full video sequences from five standard views, trained on 60,747 echocardiographic video-report pairs. We evaluated the gains in retrieval performance from video input and multi-view support, including the contributions of various pretrained models. Code and model weights are available at https://github.com/UTcardiology/video-echo-clip
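A CLIP-style video-language model of this kind is typically trained with a symmetric contrastive (InfoNCE) objective over matched video/report embedding pairs. The following is a minimal sketch of that objective; the function name, temperature, and the use of plain NumPy are illustrative assumptions, and the paper's actual multi-view fusion and training recipe are not reproduced here:

```python
import numpy as np

def clip_contrastive_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of video/report embedding pairs.

    video_emb, text_emb: (N, D) arrays where row i of each is a matched pair.
    Matched pairs are pulled together, all other pairings pushed apart.
    """
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature          # (N, N) cosine-similarity matrix
    labels = np.arange(len(v))              # the diagonal holds the true pairs

    def xent(lg):
        # cross-entropy with the diagonal as the target class
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average of video->text and text->video directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

At retrieval time the same similarity matrix is reused: a query report is matched to the video (or view) with the highest cosine similarity.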



Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

Sun, Yuchong

Neural Information Processing Systems

Large-scale video-language pre-training has shown significant improvement in video-language understanding tasks. Previous studies of video-language pre-training mainly focus on short-form videos (i.e., within 30 seconds) and sentences,




Improving Out-of-distribution Human Activity Recognition via IMU-Video Cross-modal Representation Learning

Cheshmi, Seyyed Saeid, Lyu, Buyao, Lisko, Thomas, Rajamani, Rajesh, McGovern, Robert A., Varatharajah, Yogatheesan

arXiv.org Artificial Intelligence

Human Activity Recognition (HAR) based on wearable inertial sensors plays a critical role in remote health monitoring. In patients with movement disorders, the ability to detect abnormal movements in the home environment can enable continuous optimization of treatment and help alert caretakers as needed. Machine learning approaches have been proposed for HAR tasks using Inertial Measurement Unit (IMU) data; however, most rely on application-specific labels and lack generalizability to data collected in different environments or populations. To address this limitation, we propose a new cross-modal self-supervised pretraining approach that learns representations from large-scale unlabeled IMU-video data, and demonstrate improved generalizability in HAR tasks on out-of-distribution (OOD) IMU datasets, including a dataset collected from patients with Parkinson's disease. Specifically, our results indicate that the proposed cross-modal pretraining approach outperforms the current state-of-the-art IMU-video pretraining approach and IMU-only pretraining under zero-shot and few-shot evaluations. Broadly, our study provides evidence that for highly dynamic data modalities, such as IMU signals, cross-modal pretraining may be a useful tool for learning generalizable data representations.
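Few-shot evaluation of a pretrained embedding space is often done with a nearest-centroid probe: average the support embeddings per activity class, then assign each query to the class whose centroid is most cosine-similar. The sketch below illustrates that protocol; the function name, toy embeddings, and class labels are hypothetical, not the paper's exact evaluation setup:

```python
import numpy as np

def few_shot_classify(support_emb, support_labels, query_emb):
    """Nearest-centroid few-shot classification in a pretrained embedding space.

    support_emb: (N, D) labeled embeddings (a few examples per class).
    query_emb:   (M, D) unlabeled embeddings to classify.
    Returns the predicted class label for each query.
    """
    classes = np.unique(support_labels)
    # one centroid per class: the mean of that class's support embeddings
    centroids = np.stack([support_emb[support_labels == c].mean(axis=0)
                          for c in classes])
    centroids = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    # assign each query to the most cosine-similar centroid
    return classes[(q @ centroids.T).argmax(axis=1)]
```

Because the classifier has no trainable parameters, any gain over a weaker baseline under this probe can be attributed to the quality of the pretrained representations rather than to downstream fitting.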